Creating an Appropriate Corpus for PP Attachment Training
Abstract
This paper describes work in progress to identify shortcomings of existing Prepositional Phrase (PP) attachment algorithms and to produce a new resource derived from the Penn TreeBank (PTB) corpus. The aim is to use this new resource (PTB Prime) to improve the accuracy of PP attachment algorithms and to apply the improved algorithms in an existing text processing system (LaSIE-II).
Similar resources
Corpus Based PP Attachment Ambiguity Resolution with a Semantic Dictionary
This paper deals with two important ambiguities of natural language: prepositional phrase attachment and word sense ambiguity. We propose a new supervised learning method for PP attachment based on a semantically tagged corpus. Because no sufficiently large sense-tagged corpus exists, we also propose a new unsupervised context-based word sense disambiguation algorithm which amends the tra...
A Flexible Unsupervised PP-Attachment Method Using Semantic Information
In this paper we revisit the classical NLP problem of prepositional phrase attachment (PP-attachment). Given the pattern V-NP1-P-NP2 in the text, where V is a verb, NP1 is a noun phrase, P is the preposition and NP2 is the other noun phrase, the question asked is where P-NP2 attaches: to V or to NP1? This question is typically answered using both word and world knowledge. Word Sense Disambig...
Combining Unsupervised and Supervised Methods for PP Attachment Disambiguation
Statistical methods for PP attachment fall into two classes according to the training material used: first, unsupervised methods trained on raw text corpora and second, supervised methods trained on manually disambiguated examples. Usually supervised methods win over unsupervised methods with regard to attachment accuracy. But what if only small sets of manually disambiguated material are avail...
Disambiguation of English PP Attachment using Multilingual Aligned Data
Prepositional phrase attachment (PP attachment) is a major source of ambiguity in English. It poses a substantial challenge to Machine Translation (MT) between English and languages that are not characterized by PP attachment ambiguity. In this paper we present an unsupervised, bilingual, corpus-based approach to the resolution of English PP attachment ambiguity. As data we use aligned linguist...
Using Parsed Corpora for Structural Disambiguation in the TRAINS Domain
This paper describes a prototype disambiguation module, KANKEI, which uses two corpora of the TRAINS project. In ambiguous verb phrases of the form V NP PP or V NP adverb(s), the two corpora have very different PP and adverb attachment patterns: in the first, the correct attachment is to the VP [...] of the time, while in the second, the correct attachment is to the NP [...] of the time. KANKEI uses various n-gram patterns...
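The V-NP1-P-NP2 pattern mentioned in the listing above can be made concrete with a small sketch. The Python below is an illustration only, not the method of any paper listed here: it assumes each instance has been reduced to four head words, labels attachment as "V" (verb) or "N" (noun), and uses a deliberately naive baseline that predicts the majority attachment seen for each preposition. The names `PPInstance`, `train_prep_baseline`, and `predict` are hypothetical, and the examples are hand-made rather than taken from any corpus.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class PPInstance(NamedTuple):
    """One V-NP1-P-NP2 instance, reduced to head words (an assumed simplification)."""
    verb: str
    noun1: str
    prep: str
    noun2: str


def train_prep_baseline(examples):
    """Count, per preposition, how often the PP attached to the verb ("V") or noun ("N")."""
    counts = defaultdict(Counter)
    for instance, label in examples:
        counts[instance.prep][label] += 1
    return counts


def predict(counts, instance, default="N"):
    """Return the majority attachment seen for this preposition, or `default` if unseen."""
    seen = counts.get(instance.prep)
    if not seen:
        return default
    return seen.most_common(1)[0][0]


# Toy, hand-made examples for illustration only; not drawn from any corpus.
toy_examples = [
    (PPInstance("ate", "pizza", "with", "fork"), "V"),
    (PPInstance("ate", "pizza", "with", "anchovies"), "N"),
    (PPInstance("opened", "door", "with", "key"), "V"),
    (PPInstance("saw", "president", "of", "company"), "N"),
]

model = train_prep_baseline(toy_examples)
print(predict(model, PPInstance("cut", "bread", "with", "knife")))
print(predict(model, PPInstance("met", "head", "of", "department")))
print(predict(model, PPInstance("left", "keys", "on", "table")))
```

A preposition-only baseline ignores the verb and both nouns, which is exactly the information the supervised and unsupervised methods listed above try to exploit; it is shown here only to make the input and output of the task explicit.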